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GENERALIZED PRECONDITIONED LOCALLY HARMONIC 
RESIDUAL METHOD FOR NON-HERMITIAN EIGENPROBLEMS * 

EUGENE VECHARYNSKL , CHAO YANG^, AND FEI XUE* 

Abstract. We introduce the Generalized Preconditioned Locally Harmonic Residual (GPLHR) 
method for solving standard and generalized non-Hermitian eigenproblems. The method is particu¬ 
larly useful for computing a subset of eigenvalues, and their eigen- or Schur vectors, closest to a given 
shift. The proposed method is based on block iterations and can take advantage of a preconditioner 
if it is available. It does not need to perform exact shift-and-invert transformation. Standard and 
generalized eigenproblems are handled in a unified framework. Our numerical experiments demon¬ 
strate that GPLHR is generally more robust and efficient than existing methods, especially if the 
available memory is limited. 
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1. Introduction. Large non-Hermitian eigenproblems arise in a variety of im¬ 
portant applications, including resonant state calculation [3, 16, 33] or excited state 
analysis for equation-of-motion coupled-cluster (EOM-CC) approaches [14, 15] in 
quantum chemistry, linear stability analysis of the Navier-Stokes equation in fluid dy¬ 
namics [5, 6], crystal growth simulation [21, 22], magnetohydrodynamics [18], power 
systems design [17], and many others; see, e.g., [45] for more examples. 

In their most general form, these problems can be written as 

(1.1) Ax = XBx, 

where A and B are general square matrices. We are particularly interested in the case 
in which both A and B are very large and sparse, or available only implicitly through 
a matrix-vector multiplication procedure. If B is the identity matrix, then (1.1) 
becomes a standard eigenproblem. 

The spectrum of (1.1), denoted by A(A, R), is given by a set of numbers A G C 
that make A — XB singular. The value of A can be infinity in the case of a singular B. 
Given a scalar u G C, which we refer to as a shift, we seek to find a subset of eigen¬ 
values A G A(A, B) that are closest to a, and their associated (right) eigenvectors x. 
These eigenvalues can be either extreme eigenvalues (e.g., eigenvalues with the largest 
magnitude) or interior eigenvalues that are inside the convex hull of the spectrum. 

Our focus in this paper is on algorithms for computing interior eigenvalues and 
their corresponding eigenvectors of a non-Hermitian pencil. It is well known that 
these eigenpairs are often difficult to compute in practice. Traditional methods, such 
as the inverse subspace iteration or variants of the shift-invert Arnoldi algorithm, 
see, e.g., [2, 36], rely on using spectral transformations that require performing LU 
factorizations of A — aB and computing solutions to triangular linear systems. Such 
an approach can be prohibitively expensive, especially when the problem size is large 
and the LU factors of A — aB are not sparse. 
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There has been some effort in recent years to develop methods that are factor¬ 
ization free. Examples of such methods include the inexact inverse subspace itera¬ 
tion [34, 48] and inexact shift-invert Arnoldi methods [8, 49] in which linear systems 
of the form {A — aB)w = r are solved iteratively. While these schemes can be con¬ 
siderably less expensive per iteration, the overall convergence of these methods is 
less predictable. There is often a tradeoff between the accuracy of the solution to 
linear systems and the convergence rate of the subspace or Arnoldi iterations. Set¬ 
ting an appropriate convergence criterion for an iterative solution of {A — aB)w = r 
is not straightforward. 

Another class of factorization-free methods include the Generalized Davidson 
(GD) method [27, 28] and the Jacobi-Davidson (JD) methods [7]. The GD meth¬ 
ods generally do not rely on solution of linear systems. The JD style QR and QZ 
algorithms (JDQR and JDQZ) presented in [7] can be viewed as preconditioned eigen- 
solvers in which a Newton-like correction equation is solved approximately by a pre¬ 
conditioned iterative solver. 

The GD method can be easily extended into a block method in which several 
eigenpairs can be approximated simultaneously [38, 50]. Block GD is widely used in 
quantum chemistry. A block GD method is particularly well suited for modern high 
performance computers with a large number of processing units. This is because in 
a block method several (sparse) matrix vector multiplications, which often constitute 
the major cost of the algorithm, can be carried out simultaneously. Furthermore, one 
can easily implement data blocking, exploit data reuse and take advantage of BLAS3 
in a block method. These techniques can lead to significant speedups on modern high 
performance computers as reported in [1]. However, the convergence of the block 
GD method depends to a large degree on the effectiveness of the preconditioner. 
In quantum chemistry applications, the matrix A is often diagonal dominant. The 
diagonal part of the matrix can be used to construct an effective preconditioner. In 
other applications, it is less clear how to construct a good preconditioner when A is 
not diagonal dominant [46]. 

The JDQR and JDQZ methods tend to be more robust with respect to the choice 
of preconditioners. However, extending JDQR and JDQZ to block methods is not 
straightforward. Because JDQR and JDQZ methods compute one eigenpair at a 
time, concurrency can only be exploited within a single matrix-vector multiplication 
procedure, which limits its parallel scalability to a large number of processors. 

In this paper, we present a new family of methods for solving non-Hermitian 
eigenproblems, called the Generalized Preconditioned Locally Harmonic Residual 
(GPLHR) methods. The proposed scheme is a block method that constructs approx¬ 
imations to a partial (generalized) Schur form of the matrix pair (A, B) associated 
with (1.1) iteratively. It does not require performing an LU factorization of A — aB. 
It allows a preconditioner to be used to construct a search space from which approx¬ 
imations to the Schur vectors can be extracted. We demonstrate that the GPLHR 
methods are generally more robust and efficient than JDQR/JDQZ as well as the 
block GD method when the same preconditioner is used, and when dimension of the 
search spaces are comparable. In addition, because GPLHR is a block method, it is 
well suited for high performance computers that consist of many cores or processors. 

The GPLHR algorithm is a generalization of the recently introduced Precondi¬ 
tioned Locally Harmonic Residual (PLHR) method for computing interior eigenpairs 
of Hermitian eigenproblems [46] to the non-Hermitian case (hence the name). The 
GPLHR scheme can also be viewed as an extension of the well known LOBPCG [19] 
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method and block inverse-free preconditioned (BIFP) Krylov subspace methods [9, 32] 
for computing extreme eigenpairs of Hermitian problems. At each step, all these meth¬ 
ods construct trial subspaces of a fixed dimension and use them to extract the desired 
approximations by means of a subspace projection. The key difference is that GPLHR 
systematically uses a harmonic Rayleigh-Ritz procedure [26, 29], which allows for a 
robust eigen- or Schur pair extraction independent of the location of the targeted 
eigenvalues. Moreover, the proposed method explicitly utilizes properly chosen pro¬ 
jectors to stabilize convergence, and is capable of performing iterations using the 
approximations to Schur vectors instead of potentially ill-conditioned eigenvectors. 

This paper is organized as follows. In Section 2, we review standard eigenvalue- 
revealing decompositions for non-Hermitian eigenproblems and propose a new gener¬ 
alized Schur form which emphasizes only the right Schur vectors. This form is then 
used to derive the GPLHR algorithms for standard and generalized eigenproblems in 
Section 3. The preconditioning options are discussed in Section 4. Section 5 presents 
numerical results. Our conclusions can be found in Section 6. Appendix A briefly 
describes the approximate eigenbasis based GPLHR iteration (GPLHR-EIG). 

2. Eigenvalue-revealing decompositions. For non-Hermitian eigenproblems, 
it is common (see, e.g., [24, 36, 44]) to replace (1.1) by an equivalent problem 

(2.1) /3Ax = aBx, 

where the pair called a generalized eigenvalue, defines an eigenvalue A = a//? 

of the matrix pair {A,B). This formulation is advantageous from the numerical 
perspective. For example, it allows revealing the infinite or indeterminate eigenvalues 
of (1.1). The former corresponds to the case where B is singular and /3 = 0, a ^ 0, 
whereas the latter is indicated by a = /3 = 0 and takes place if both A and B are 
singular with a non-trivial intersection of their null spaces. Additionally, the separate 
treatment of a and /3 allows one to handle situations where either of the quantities 
is close to zero, which leads to underflow or overflow for A = q;// 3. Note that a 
generalized eigenvalue (a,/3) is invariant with respect to multiplication by a scalar, 
i.e., for any nonzero t S C, (ta,tl3) is also a generalized eigenvalue. In order to fix a 
representative pair, a normalizing condition should be imposed, e.g., |a|^ -I- |/3|^ = 1. 

In this paper, we assume that the pair {A,B) is regular, i.e., <lei{j3A + aB) is not 
identically zero for all a and /3. The violation of this assumption gives a singular pair, 
which corresponds to an ill-posed problem. In this case, a regularization procedure 
should be applied to obtain a nearby regular pair, e.g., [2, Chapter 8.7], for which the 
eigenproblem is solved. In most cases, however, {A, B) is regular, even when A or 
B or both are singular as long as the intersection of their null spaces is zero. Thus, 
regular pairs can have infinite eigenvalues, whereas the possibility of indeterminate 
eigenvalues is excluded. 

2.1. Partial eigenvalue decomposition and generalized Schur form. Let 

us assume that the targeted eigenvalues of (1.1) are non-deficient, i.e., their alge¬ 
braic and geometric multiplicities coincide. Then, given formulation (2.1), the partial 
eigenvalue decomposition of {A,B) can be written in the form 

(2.2) AXKb = BXKa, 

where X G is a matrix whose columns Xj correspond to the k eigenvectors 

associated with the wanted eigenvalues Xj. The k-hy-k matrices A^ and Kb are 
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diagonal, with the diagonal entries given by and j3j, respectively. In our context, 
it is assumed that \j = otj/j3j are the eigenvalues closest to a. 

Since, by our assumption, A^ ’s are non-deficient, there exists k associated linearly 
independent eigenvectors, so that X is full-rank and (2.2) is well defined. For a vast 
majority of applications, the partial eigenvalue decomposition (2.2) does exist, i.e., 
the desired eigenvalues are non-deficient. Nevertheless, pursuing decomposition (2.2) 
is generally not recommended, as it aims at computing a basis of eigenvectors which 
can potentially be ill-conditioned. 

In order to circumvent the difficulties related to ill-conditioning or deficiency of 
the eigenbasis, we consider instead a partial generalized Schur decomposition 



QRa, 

QRb', 


(2.3) 


see, e.g., [7, 36, 43]. Here, V and Q are the n-hy-k matrices of the right and left 
generalized Schur vectors, respectively. The factors Ra and Rb are k-hy-k upper 
triangular matrices, such that the ratios Xj = RaU, j)/R bU, j) of their diagonal 
entries correspond to the desired eigenvalues. 

The advantage of the partial Schur form (2.3) is that it is dehned for any {A,B) 
and both V and Q are orthonormal, so that they can be computed in a numerically 
stable manner. The matrix V then gives an orthonormal basis of the invariant sub¬ 
space associated with the eigenvalues of interest. If individual eigenvectors are needed, 
the right generalized Schur vectors can be transformed into eigenvector block X by the 
means of the standard Rayleigh-Ritz procedure for the pair {A,B), performed over 
the subspace spanned by the columns of V. Similarly, Q contains an orthonormal 
basis of the left invariant subspace and can be used to retrieve the left eigenvectors. 

2.2. The “Q-free” partial generalized Schur form. We start by addressing 
the question of whether it is possible to eliminate Q from decomposition (2.3) and ob¬ 
tain an eigenvalue-revealing Schur type form that contains only the right Schur vectors 
V and triangular factors. Such a form would provide a numerically stable alternative 
to the eigenvalue decomposition (2.2). It would also give an opportunity to handle the 
standard and generalized eigenproblems in a uniform manner. Most importantly, it 
would allow us to define a numerical scheme that emphasizes computation only of the 
generalized right Schur vectors, which are often the ones needed in a real application. 
The left Schur basis Q can be obtained as a by-product of computation. 

For example, if Rb is nonsingular, then Q can be eliminated in a straightforward 
manner by multiplying the bottom equation in (2.3) by R^^ and substituting the 
resulting expression for Q into the top equation. This yields the desirable decom¬ 
position AV = BVMa, where Ma = R^^Ra is triangular with eigenvalues on the 
diagonal. Similarly, assuming that Ra is nonsingular, one can obtain AVMb = BV, 
where Mb = R'^^Rb is also triangular, with inverted eigenvalues on the diagonal. 

However, the above treatment involves inversions of Ra or Rb, and consequently 
could cause numerical instabilities whenever the triangular matrices to be inverted 
have very small diagonal entries. Moreover, in the presence of both zero and infinite 
eigenvalues, Ra and Rb are singular and cannot be inverted at all. Therefore, it 
would be ideal to construct a Q-free Schur form for any regular {A, B) that does not 
rely on the inverse of the triangular matrices Ra and Rb- 

In the rest of the section, we show that such a “Q-free” partial Schur form is 
possible. We start with the following lemma. 
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Lemma 2.1. Let Ra and Rb be upper triangular matrices, such that \RA{j,j) \ + 
7^ 0 far all 1 < j < fc. Then for any upper triangular Gi and G 2 , and 
any tj € C, such that 

\RaUg)\ < \RbUg)\ 

otherwise 


\RA{j,j)\ < \RB{j,j)\ . 

? 

otherwise 

the matrix G = RaGi + RbG 2 is upper triangular with G{j,j) = 1 for all 1 < j < k. 

Proof. The matrix C? is a combination of upper triangular matrices and, hence, is 
upper triangular. For each diagonal entry of G, we have G{j,j) = RaU, j)Gi{j, j) + 
RbU, j)d2{j, j)j where RaUG) and RbGG) do not equal zero simultaneously. It 
is then easy to check that for any Tj G C, (2.4) and (2.5) satisfy RA{jG)Gi{j, j) + 
RBUG)G 2 {j,j) = 1 for all 1 < j < A:. □ 

The following theorem gives the existence of a “Q-free” generalized Schur form. 
Theorem 2.2. For any regular pair {A,B) there exists a matrix V G with 

orthonormal columns and upper triangular matrices Ma,Mb G such that 

(2.6) AVMb = BVMa, 

and Xj = MA{j,j)/MB{j,j) are (possibly infinite) eigenvalues of {A,B). 

Proof We start from the generalized Schur decomposition (2.3). Let Gi and G 2 be 
upper triangular with diagonal elements defined by (2.4) and (2.5). Right-multiplying 
both sides of the top and bottom equalities in (2.3) by G\ and G 2 , respectively, and 
then summing up the results, gives an equivalent system 

AV = QRa, 

AVGi + BVG2 = QG, 

where 



(2.4) 


Gi(j,j) = <1 l-TjRB{j,j) 
RaUG) 


and 


(2.5) 


1 - TjRaGG) 
G2{jG) = { RbGG) 


G — RaGi -I- RbG2. 


Since {A, B) is a regular pair, RaGG) and RbGG) do not equal zero simultaneously 
for the same j. Thus, by Lemma 2.1, G is upper triangular with G{jG) = 1 for all j. 
In particular, the latter implies that the inverse of G exists. Hence, we can eliminate 
Q from (2.7). 

It follows from the bottom equality in (2.7) that Q = {AVGi + BVG 2 )G~^. 
Substituting it to the top identity gives AV = {AVGi + BVG 2 )G~^Ra, which im¬ 
plies (2.6) with 

( 2 . 8 ) Ma = G2G-^Ra, Mb = I -GiG-^Ra- 

Furthermore, since the diagonal entries of G are all ones, G~^{jG) = 1 for all j. Thus, 
from (2.8), the diagonal elements of Ma and Mb are related to those of Ra and Rb by 
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MaUJ) = G 2 {j,j)RA{j,j) and MbUJ) = 1 - Gi{j,j)RAij,j)- In particular, using 
definition (2.4) and (2.5) of G'i(j, j) and G 2 {j,j), if IRaUJ)] < IRbUJ)], we obtain 

(2.9) MaUJ) = MbUJ) = 1 - TjRAiJJ), 

RbU,j) 

which implies that MA{j,j)/^Bij,j) = RAij,j)lRB{j,j) — provided that tj is 
chosen such that 1 — RAUG)Tj 7^ 0- 
Similarly, if \RA{j,j)\ > \RB{j,j)\, 


( 2 - 10 ) MaUJ) = TjRA{j,j), MbUJ) = t^RbUJ)- 


Hence, by choosing tj ^ 0, we get MA{i,j)I^B{jG) = RA{j, j)IR bUG) = Aj- □ 

Given the traditional generalized Schur decomposition (2.3), Theorem 2.2 sug¬ 
gests a simple approach for obtaining the “Q-free” form (2.6). Namely, one starts 
with setting up the upper triangular matrices Gi, G 2 , and G defined according to 
Lemma 2.1, and then applies (2.8) to evaluate Ma and Mb- The right Schur basis 
V is the same in (2.3) and (2.6). Note that the definition (2.8) of the factors Ma 
and Mb does not rely on inverting Ra or Rb and only requires an inverse of a trian¬ 
gular matrix G which has all ones on the diagonal. Hence, it avoids inverting (nearly) 
singular triangular matrices. Additionally, the proposed formulas (2.4) and (2.5), as 
well as (2.9), always have the largest modulus number between i?^(j, j) and RbUG) 
in the denominator, which mitigates potential numerical issues introduced by the 
small diagonal elements of Ra and Rb- 

Clearly, the choice of Gi and G 2 in (2.4) and (2.5), and hence decomposition (2.6), 
is not unique, as it depends on the value of the parameter tj and allows freedom in the 
choice of entries above diagonal in Gi and G 2 - Therefore, in order to fix particular 
Ma and Mb, in practice, we choose Gi and G 2 to be diagonal and set Tj = 0 if 
\RA{j,j)\ < \RB{j,j)\ and Tj = 1, otherwise. This yields the diagonal matrices 


( 2 . 11 ) 


and 

( 2 . 12 ) 


Giij,j) = 


1 - RB{j,j) 
RaUJ) 


\RA{j,j)\ < \RB{j,j)\ 
otherwise 


G2{j,j) 


^/RB{j,j), \RA{j,j)\ < \RB{j,j)\ . 

1, otherwise ’ 


for which, by (2.9), MA{j,j) = RAij,j)/RB{j,j), MB{j,j) = 1 if |i?A(j,j)| < 
\RB{j,j)\, and, by (2.10), MA{j,j) = RA{j,j), MB{j,j) = RB{j,j), otherwise. Note 
that it is also possible to choose r,- that give Ma and Mb, such that |M^(j, j)p -|- 
= Ij bnt our implementation follows the simple formulas (2.11) and (2.12). 

3. The GPLHR algorithm for computing a partial Schur form. For Her- 
mitian problems, a class of powerful eigensolvers is based on preconditioned block it¬ 
erations; e.g., [19, 32, 46]. Such methods fit into a unified framework consisting of two 
key ingredients: generation of the trial (or search) subspace and extraction of the ap¬ 
proximate eigenvectors. Namely, given a number of eigenvector approximations and a 
preconditioner, at each step i, a low-dimensional trial subspace is constructed. If 
properly chosen, ZG> contains improved approximations to the wanted eigenvectors, 
which are extracted from the subspace using a suitable projection procedure and are 
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then used to start a new iteration. In this section, we extend this framework to the 
non-Hermitian case. 

The results of Section 2.2 are central to our derivations. In particular, instead of 
the standard partial generalized Schur decomposition (2.3), we focus on the “Q-free” 
form (2.6), and seek approximation of the right Schur vectors V and the associated 
triangular factors Ma and Mb- As we shall see, the approximation to the left Schur 
vectors Q, as well as to triangular matrices Ra and 7?^, can be easily obtained as a 
by product of the proposed scheme. 

3.1. Construction of the trial subspace. Let V^'^\ M^\ and Mg'^ be ap¬ 
proximations to V, Ma^ and Mb in (2.2) at an iteration i, such that has k 
orthonormal columns, and M\ and Mg are k-hy-k upper triangular. In order to 
define a trial subspace that can be used for searching a new approximation I/b+^), 
we adopt the following subspace orthogonal correction viewpoint. 

We are interested in constructing a subspace that provides a good represen¬ 
tation of the correction C^'‘\ such that 

(3.1) A(yW + + A^g'>) = B(CW -k C^^){Mf + A^^), 

where I/(d*c(i) = Q; and A^g> are corrections of the triangular factors M^'^ 

and Mg\ respectively. Once the “correction subspace” is determined, the trial 
subspace can be defined as the subspace sum = col{y^®^} -|-/C(®\ where coljy^®)} 
denotes the column space of 

After rearranging the terms in (3.1), we get 

- BC^^Mf = -{AV^'^^M^g^ - BV'^^Mf) + BVA^^ - AVA^g\ 

wherey = yW-fCW is the matrix of the exact right Schur vectors. Then, using (2.3), 
we arrive at the identity 

(3.2) AC'^^Mf - BC^^M^^^ = -{AV^^M^^^ - BV^^'>M^2’) + Q{RbA^2 “ RaA^b)^ 

where Q is the matrix of the exact left Schur vectors. Let Pq± = I — be 

an orthogonal projector onto coljQ*^®^}-*-, where Q*-®^ is a matrix whose orthonormal 
columns represent a basis of colj^sAl/^®^ -|- 5aBV^^^} for some scalars Sa,Sb G C, 
such that |5yi| -I- |i5_b| ^ 0. Then, applying Pq± to both sides of (3.2) and neglecting 
the high-order term (/ — Q^^'>Q^'^'>*)Q{RbA^^ — i?^A^^) gives the equation 

(3.3) iPQ±APv±)CM'^^ - {PQ^BPv±)CMf = -PQ^{AV^'^Mf - BV^'^Mf), 

whose solution C, constrained to satisfy I/l®)*^ = 0, provides an approximation to the 
desired correction C^®^ of (3.1). Here, Pyi^ = I — 1 /(®)h(®)* is an orthogonal projector 
onto coljH^®^}-*- and, hence, C = Py±C. 

Equation (3.3) represents a generalized Sylvester equation [39] and can be viewed 
as a block extension of the standard single vector Jacobi-Davidson (JD) correction 
equation [7, 40], where the right-hand side is given by the projected residual of prob¬ 
lem (2.6). Note that a connection between the subspace correction and solution of a 
Sylvester equation was also observed in [31], though not in the context of the general¬ 
ized Schur form computation. Thus, a possible option is to (approximately) solve (3.3) 
for a block correction C and then set /C^®) = col{C}. 
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However, with this approach, it is not straightforward to solve the matrix equa¬ 
tion (3.3) efficiently. Under certain assumptions, one can reduce (3.3) to the standard 
Sylvester equation, but this is generally expensive for large problems. 

Instead, we do not seek the solution of the generalized Sylvester equation, and 
use a different strategy to generate the correction subspace To this end, we 

consider (3.3) as an equation of the general form 

(3.4) L{C) = F, = 0, 

where L(C) = {PQrAPy±)CMf - {PQ±BPv±)CMf and F = -PQ±{AV^^Mf - 

Since L(-) is a linear operator, standard subspace projection techniques 
for solving linear systems, e.g., [10, 35], can be formally applied to (3.4). Thus, we 
can define a preconditioned Krylov-like subspace for (3.4). However, instead of using 
this subspace to solve (3.4), we place it directly in an eigensolver’s trial subspace. 

More precisely, let : coljQ^*^}-*- —> col{U^*^}'’‘ be a preconditioner for the 
operator L{-). Then, following the analogy with the Krylov subspace methods for 
(block) linear systems [10, 11, 35], one can expect that the solution C of (3.4) can be 
well represented in the preconditioned block Krylov subspace 

block span {ri(K),Ti {L{Tl{F)), ..., (TiL)'"(Ti(F)))} , 

generated by the operator Tl{L{-)) and the starting block Tl{F) G C"^^, which 
contains all blocks of the form ^i^q{TlLY{Tl{F))Gi, with S 
particular, this implies that each column of G should be searched in 

(3.5) =/C«+i 

where the blocks K = [1U(*\ ..., Sm^] represent what we call the Krylov- 

Arnoldi sequence, generated by a preconditioned Arnoldi-like procedure for prob¬ 
lem (3.4), or equivalently, for the generalized Sylvester equation (3.3). This procedure 
is stated in Algorithm 1, where the input parameter m determines the subspace size. 


Algorithm 1: Preconditioned block Arnold! type procedure for equation (3.4) 

Input: preconditioning operator Tl, the parameter m. 

Output: the Krylov-Arnoldi basis K = [W, Si,, Sm] 

1: 1^0; W ^ orth°- 
2: for I — 1 ^ m do 

3: Si^TL{L{Si.i)y, Si^ s,-w{w*Siy, 

4: for j — 1 ^ I — 1 do 

5: Si^ Si-Sj{s*Siy 

6: end for 

7: Si -s— orth(S'!); 

8: end for 

9: Return A = [W, S'!,..., 5^]. 

“Throughout, orth(y) denotes a procedure for orthonormalizing columns of V. 


Thus, given the subspace (3.5), capturing the orthogonal correction G, the eigen¬ 


solver’s trial subspace can be defined as coljU^*)} 


■ i e 


= col{U(*), , S'f ^ ,SY\..., S'W } 


(3.6) 
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Clearly, the Krylov-Arnoldi sequence vectors K, generated by Algorithm 1, represents 
a system of orthonormal columns that are orthogonal to due to the choice of the 
preconditioner : coljQ^*^}-*- —)■ col{y*^*^}"*~. Hence, the trial subspace in (3.6) 
is spanned by orthonormal vectors, which gives a stable basis. 

A desirable feature of the preconditioner Tl for problem (3.4) (or (3.3)) is that 
it should approximate the inverse of the operator A(-). One way to construct such 
an approximation is to replace the upper triangular matrices and in the 
expression for L(-) by diagonals cr^i/ and asl, respectively, where oaI'^b = t. This 
results in an approximation of L(-) that is given by the matrix (J — — 

crAS)(/-1/WcW*). Thus, one can choose the preconditioner as 

(3.7) Tl = (I- v‘'^V^>)T{I - 

where T « {asA — Throughout, we consider T a,s a preconditioner for eigen¬ 

value problem (1.1), which should be distinguished from the preconditioner Tl for 
the generalized Sylvester equation. Note that, in practice, assuming that ct is a finite 
shift that is different from an eigenvalue, we can set aA = cr and as = 1, so that 
T ^ (A — aB)~^ is an approximate shift-and-invert operator. 

Finally, given the definition (3.7) of the preconditioner Tl, we can derive explicit 
expressions for the blocks in the Krylov-Arnoldi sequence generated from Algorithm 1. 
It is easy to check that 

(3.8) VFW = orth((/ - V^^V^^*)T{I - Q^^*){AV'^^'^Mf - 


and, for I = . ,m, 


(3.9) 


= orth(P^'7^^(/-ITWlF(*)*)5P); 


where = IT^*) and is the orthogonal projector onto coll^^'^ ..., 


(3.10) 








W-sfl, 


Mi-l)* 

^ 1-2 


)...(/- 5 « 




We remark that the above construction of the trial subspace (3.6) is similar in 
spirit to the one used to devise the BIFP Krylov subspace methods in [9, 32] for 
computing extreme eigenpairs of Hermitian problems. The conceptual difference, 
however, is that we put the block correction equation (3.3) in the center stage, whereas 
the BIFP methods are based on Krylov(-like) subspaces delivered by a simultaneous 
solution of problems {A — ajB)w = r for k different shifts aj. In particular, this 
allows us to discover the projectors I — and I — in (3.7). These 

projectors are then blended into formulas (3.8)-(3.I0) that define the trial subspace. 
Their presence has a strong effect on eigensolver’s robustness, as we shall demonstrate 
in the numerical experiments of Section 5. 

3.2. The trial subspace for standard eigenvalue problem. If B = /, in¬ 
stead of the generalized form (2.3), we seek the standard partial Schur decomposition 


(3.11) AV = VR, 

where V G contains orthonormal Schur vectors and R G is upper trian¬ 

gular with the wanted eigenvalues of A on its diagonal. It is clear that (3.11) is a 
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special case of the “Q-free” form (2.6) with B = I, Ma = R, and Mb = I- There¬ 
fore, the derivation of the trial subspace, described in the previous section, is directly 
applicable here. 

In particular, let be an approximate Schur basis, and let and 

be the associated upper triangular matrices that approximate R and /, respectively. 
Then, following the arguments of Section 3.1, from (3.1) with B = I, we get 

(3.12) - i?A^^). 

This equality is the analogue to (3.2) for a standard eigenvalue problem. Here, in 
order to approximate the correction it is natural to apply the projector = 
j—yMyM* to both sides of (3.12) and neglect the term (J— (A^^—i?A^^). 
As a result, we obtain the equation 

(3.13) {Pv^APv^)CM^g^ - = -Pv^{AV^^ 

where the solution C, which is orthogonal to V^''\ approximates the desired Note 
that in contrast to the correction equation (3.3) for the generalized eigenvalue problem, 
which is based on two projectors Py^ and Pq-. (3.13) involves only one projector Py±. 
This is expected, as both and approximate the same vectors, V. 

Considering (3.13) as an equation of the general form (3.4), where L{C) = 
{Py±APy±)CM^g'’ - and F = -Py±{AV^'>Mf - apply m 

steps of Algorithm 1 with the preconditioner 

(3.14) Tl = {I- 


This yields the trial subspace (3.6), where 


(3.15) = orth((/ - 


and, for 1 = 1,..., m. 


(3.16) 


Sf 

st 


(I - yWt/W*)T(/ _ ^ 

OTth{p2r^\l - lT(*)lTd)*),sf)). 


As before, we assume that in (3.16) and that P^Z^'^ is the orthogonal pro¬ 

jector defined in (3.10); the preconditioner T in (3.14) is chosen as an approximation 
of the shift-and-invert operator (A — al)~^. 

3.3. The harmonic Schur Rayleigh Ritz procedure. Let be a trial 
subspace of dimension s = (m -I- 2)k at iteration i defined by (3.6) with (3.8)-(3.10). 
We try to find a new orthonormal approximate Schur basis and the correspond¬ 
ing k-hy-k upper triangular matrices and such that each column of 

1 /d-i-i) belongs to and approximate the triangular factors in (2.6). 

To fulfill this task, we use a harmonic Rayleigh-Ritz projection [26, 29], adapted 
to the case of the “Q-free” Schur form (2.6). Specifically, we let = {A — aB)Z^'^'> 
be a test subspace, where we assume that the target shift a is not an eigenvalue. Then 
the approximations and can be determined by requiring each 

column of the residual of (2.6) to be orthogonal to this test subspace, i.e., 




(3.17) 
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This yields the projected problem 

(3.18) {U*AZ)YM^g^^'> = 

where Y £ G are unknown; and Z,U e contain 

orthonormal columns spanning and respectively. Once we have computed 
y, and approximations to the Schur vectors, determined by (3.17), 

are given by = ZY. 

Let us now consider the solution of the projected problem (3.18). We first show 
that the projected matrix pair {U*AZ, U*BZ) is regular. 

Proposition 3.1. Let Z^U = orth((A — aB)Z) g C"XS contain orthonormal 
basis vectors of the trial and test subspaces, and assume that a is different from any 
eigenvalue of {A, B). Then the projected pair {U*AZ, U*BZ) is regular. 

Proof. Assume, on the contrary, that {U*AZ,U*BZ) is singular. Then, by defi¬ 
nition of a singular matrix pair, regardless of the choice of cr, 

(3.19) det{U*AZ - aU*BZ) = 0. 

Since U is an orthonormal basis of col{(A — aB)Z}, we can write (A — aB)Z = UF, 
where F is an s-by-s square matrix. This matrix is nonsingular, i.e., det(P) 0, 
because A — aB is nonsingular. Hence, U = {A — aB)ZF~^, and 

U*AZ - aU*BZ = F-*Z*{A - aB)*AZ - aF-*Z*{A - aB)*BZ 
= F-*Z*{A - aB)*{A - aB)Z. 

Thus, from the above equality and (3.19), we have 

0 = det{U*AZ - aU*BZ) = det(p-*)det(y*(A - aB)*{A - aB)Z). 

Since det(P) ^ 0, this identity implies that det{Z*{A — aB)*)A — aB)Z) = 0. But the 
matrix Z*{A — aB)*{A — aB)Z is Hermitian positive semidefinite and its singularity 
implies that det(A — aB) = 0, which contradicts the assumption that a ^ A(A, B). □ 
Thus, problem (3.18) is of the form (2.6) and the matrix pair (U*AZ,U*BZ) is 
regular, whenever a is not an eigenvalue of {A, B). Therefore, (3.18) can be solved by 
the approach described in Section 2.2. 

Specifically, one first employs the standard sorted QZ algorithm [24] to com¬ 
pute the full generalized Schur form of {U*AZ,U*BZ). This will produce two uni¬ 
tary matrices YljYh £ of the left and right generalized Schur vectors, along 

with the upper triangular Ra,Rb £ ordered in such a way that the ratios 

Ri{jT j)/R 2 {jT j) of the first k diagonal elements are closest to a, and the correspond¬ 
ing Schur vectors are located at the k leading columns of Yl and Yr. Then, we 
let Yl = !/,(:, 1 :k),Ffl = iA(:,l:k) £ and obtain the desired Schur ba¬ 
sis in (3.18) as F = Yr. The triangular factors are given as = G 2 G~^Ra 

and = I — GiG~^Ra, where G = RaGi + RbG 2 , Ra = .RA(l:k,l:k) and 

Rb = .R_B(l:k,l:k), and Gi, G 2 are diagonal matrices defined by (2.11) and (2.12) 
with Ra, Rb replaced with Ra, Rb, respectively. 

As a result, the described extraction approach, which we call a harmonic Schur- 
Rayleigh-Ritz (SRR) procedure, constructs a new basis = ZY of the ap¬ 

proximate Schur vectors (the harmonic Schur-Ritz vectors) and the upper triangular 
matrices such that the ratios of their diagonal entries are approxi¬ 

mations to the desired eigenvalues. Note that in the case of a standard eigenvalue 
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problem, the proposed use of formulas (2.11) and (2.12) for evaluating the matrices 
and indeed guarantees that they converge to R and /, as Ra and Rb 

get closer to R and /, respectively. 

Although both the construction of the trial subspace and the presented harmonic 
SRR procedure aim at finding the factors V, Ma, and Mb of the “Q-free” form (2.6), 
it is easy to see that, as a by-product, the scheme can also produce approximations to 
the left Schur vectors Q and the upper triangular factors Ra-, Rb oi the conventional 
generalized Schur decomposition (2.3) if B ^ I. Specifically, the former is given by 
= UYl, whereas the latter correspond to Ra and Rb- 

Note that the computed can be conveniently exploited at iteration z -|- 1 to 

construct the new test subspace = [A— and, if B 7^ /, define the pro¬ 
jector Pq±, because represents an orthonormal basis of col{(A — aB)V^'''^^'>}. 

Proposition 3.2. Let U = orth{{A — aB)Z) G contain orthonormal 

basis vectors of the trial and test subspaces. Assume that V = ZYr, Q = UYb G 
i^nxk approximate right and left generalized Schur vectors resulting from the 

harmonic SRR procedure, along with the upper triangular factors Ra = Ra (1: k, 1: k) 
and Rb = Rb{^-^A-^)- Then, 

{A-aB)V = Q{Ra-(jRb), 

i.e., Q = UYb represents an orthonormal basis of co\{{A — aB)V}. 

Proof Since UU* is an orthogonal projector onto co\{{A — aB)Z} and V = ZYb, 
we have 


{A - (jB)V = UU*{A - aB)ZYB, = U{U*AZ)Yr - aU{U*BZ)YR. 

But {U*AZ)Yr = YrRa and {U*BZ)Yr = YrRb (resulting from the QZ factorization 
of {U*AZ,U*BZ)), which, combined with the equality Q = UYr, gives the desired 
result. □ 

Finally, we remark that another standard way to extract Schur vectors is to choose 
the test subspace to be the same as the trial subspace . This yields the stan¬ 
dard Rayleigh-Ritz type procedure. However, the drawback of this approach is that 
Schur approximations tend to be close to exterior eigenvalues and may not be success¬ 
ful for extracting the interior eigenvalues and their Schur vectors [7, 29] . Additionally, 
it does not produce approximations of the left Schur vectors that may be of 

interest and are needed to define the projector Pq-- Therefore, in our algorithmic 
developments, we rely on the systematic use of the harmonic SRR procedure. 

3.4. Augmenting the trial subspace. At every iteration i, the construction 
of the trial subspace Z^'^'> starts with the current approximate Schur basis , which 
is used to determine the remaining blocks in (3.6). A natural question is whether it is 
possible to further enhance (3.6) if an additional information is available at the present 
stage of computation, without incurring any significant extra calculations. Namely, 
we are interested in defining a block P^U of additional search directions, such that 
the (m -|- 3)fc-dimensional subspace 

(3.20) = co1{h(*) ^, ■ • •, P^"^} 

contains better approximations to the desired Schur basis than (3.6). 

When solving Hermitian eigenvalue problems, a common technique to accelerate 
the eigensolver’s convergence is to utilize an approximate solution computed at the 
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previous step. The same idea can be readily applied to the non-Hermitian case. For 
example, given approximate right generalized Schur vectors from the previous 

iteration, we can use them to expand the trial subspace (3.6) by setting to 
or, in a LOBPCG fashion, to a combination of and V^''\ such that 

(3.21) 

where Cy is an appropriate matrix [19, 20]. Clearly, in exact arithmetic, both 
options lead to the same subspace (3.20). 

Unlike the case of Hermitian eigenvalue problems, where utilizing approximation 
from the previous step leads to a significant acceleration of the eigensolver’s conver¬ 
gence, our numerical experience suggests a different picture for non-Hermitian prob¬ 
lems. While in many cases the use of the additional search directions based on 
indeed lead to a noticeable improvement in convergence, we also observe situations in 
which no or only minor benefit can be seen. 

A better approach for augmenting (3.6) is to define as a block of k extra 
approximate Schur vectors. Specifically, if Fr contains the full (ordered) right Schur 
basis of the projected pair {U* AZ,U*BZ), where U and Z are orthonormal bases of 
the test and trial subspaces and respectively, then we let 

(3.22) P« = V^2i:2k = ^iA(:,k+l:2k), 

where Yr( :, k+1; 2k) denotes a submatrix of Fr lying in the columns k + 1 through 2k. 
As we demonstrate numerically in Section 5, this choice generally leads to a better 
augmented subspace (3.20), significantly accelerating and stabilizing the convergence. 
Therefore, we use it by default in our implementation. 

The use of formulation (3.22) can be viewed as a “thick restart” [25, 40, 42, 47], 
which retains more (harmonic) Ritz approximations than needed after collapsing the 
trial subspace. In particular, it suggests that GPLHR should transfer 2k approximate 
Schur vectors from one iterations to another. These Schur vectors correspond to the 2k 
approximate eigenvalues closest to tr. The k leading approximations then constitute 
the block whereas the k remaining ones are placed in P^®). 

3.5. The GPLHR algorithm. We are now ready to combine the develop¬ 
ments of the previous sections and introduce the Generalized Preconditioned Loeally 
Harmonie Residual (GPLHR) algorithm. 

Starting with an initial guess at each iteration f, GPLHR constructs a 

trial subspace (3.20) using the preconditioned Krylov-Arnoldi sequence (3.8)-(3.10) 
if P 7 ^ /, or (3.15)-(3.16) otherwise, and performs the harmonic SRR procedure to 
extract an updated set of Schur vector approximations with the corresponding upper 
triangular factors. The detailed description is summarized in Algorithm 2. 

An attractive feature of Algorithm 2 is that it is based only on block operations, 
which enables efficient level-3 BLAS routines for basic dense linear algebra, as well as 
utilization of sparse matrix times block (matblock) kernels. The method also provides 
a unified treatment of standard and generalized eigenproblems. In the latter case, it 
also allows handling infinite eigenvalues if the chosen shift a is sufficiently large. 

Some caution should be exercised when choosing the value of the shift tr. In par¬ 
ticular, it is expected that cr is different from any eigenvalue A of (1.1). While the shift 
is unlikely to be an eigenvalue, it is generally desirable that a is not very close to A, 
which can potentially lead to ill-conditioning of the SRR procedure and of the precon¬ 
ditioner T K, {A — aB)~^. An optimal choice of a is outside the scope of this paper. 
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Algorithm 2: The GPLHR algorithm 

Input: A regular pair {A, B) of n-hy-n matrices, shift cr £ C different from 

any eigenvalue of {A,B), preconditioner T, starting guess of the Schur 
vectors and the subspace expansion parameter m. 

Output: If 7 ^ 7, then approximate generalized Schur vectors V,Q £ and 

the associated upper triangular matrices Ra,Rb € in (2.3), such 

that Xj = RaH, j)/R bH, j) are the k eigenvalues of {A,B) closest to a. 

If 73 = 7, then the Schur vectors V £ and the associated triangular 

matrix R £ in (3.11), such that Xj = R{j,j) are the k eigenvalues 

of A closest to a. 

1: V orth(i/(°)); Q £- orth((A - aB)V)-, P ^ [ ]; 

2: [Ra,Rb,Yl,Yr] £- ordqz(Q*AU,Q*PU,cr)“; V £- VYr; Q £- QYl\ 

3: Set fc-by-A: diagonal matrices Gi and G 2 by (2.11) and (2.12); G £- RaGi + RbG 2 - 
4: Ma G2G-^Ra\ Mb^I- GiG-^Ra\ 

5: while convergence not reached do 
6 : if 73 7 ^ 7 then 

7: Wa^ AV-QRa;Wb ^ BV-QRb-, 

8 : W^{I-VV*)T{I-QQ*){WaMb-WbMa); 

9: else 

10: W ^ {I-VV*)T{I-VV*){AVMb-VMa)-, 

11 : end if 

12: IT ^ orth(lT); So ^ VP; 5^ [ ]; 

13: for 1 = 1 —>■ m do 

14: if 73 7 ^ 7 then 

15: VV*)T{I - QQ*){ASi-iMb - BSi-iMa)-, 

16: else 

17: Si ^ (7 - VV*)T{I - VV*){ASi-iMb - Si-^Ma)', 

18: end if 

19: Si ^ Si - lP(VP*Si); Si ^ Si - S(S*Si); S, £- orth(Si); S ^ [S Si]; 

20 : end for 

21 : P ^ P- V{v*py, P ^ P - W{W*P)-, P ^ P- S{S*P)-, P £- orth(P); 

22: Set the trial subspace Z [T, W, Si,..., Sm, P]', 

23: Q ^ orth((A — (t73)[VP, Si, ..., Sm, P])’, QQ — Q{Q*Q)', 

24: Set the test subspace U £- [Q,Q]', [Ra, Rb,Yl,Yr] <— ordqz{U*AZ,U*BZ, a)-, 

25: Yr £- yR(: ,l:k), Yr £- Yt(: ,l:k), Ra £- 7?a( 1 :k. 1 :k), Rr £- 7?s(l:k, 1 :k); 

26: p ^ WYw+SiYs,+.. .+SmYs^+PYp, where Yr = [Y^, Y^, Yg, ..., Yl^,Yjf 

is a conforming partitioning of Yr; 

27: V ^ VYv + P-, Q ^ UYr-, 

28: ZYr(: ,k+l :2k));'’ 

29: Set fc-by-A: diagonal matrices Gi and G 2 by (2.11) and (2.12); G •<— RaGi TPrGo; 

30: Ma £- G2G-^Ra; Mr ^ I - GiG’^'Pa; 

31: end while 

32: If P = 7, then R <— Ma- 


°'[Ra, Rb, Yr, Yr] <— ordqz(4?, ik, cr) computes the full generalized Schur decomposition for an 
input pair (<I>, 4/), ordered to ensure that |7J.4(j, i)/7?R(i, i) — u\ < \RA{j, j)/R b U, j)~rr\ for i < j. 
'"This step can be disabled if the LOBPGG-style search direction of step 26 is preferred. 


In the case of a generalized eigenproblem, a possible way to determine convergence 
is based on assessing column norms of the blocks Wa and Wb in step 7. These blocks 
represent natural residuals of the generalized Schur form (2.3). Hence, the conver- 
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gence of j initial columns vi,... ,Vj and qi,... ,qj in V and Q can be declared if 
.j)|P + I|W^b( - .j)lP is sufficiently small. If i? = /, a similar approach, based 
on examining the norms of the column of the residual AVMb — VMa, can be used. 

One can observe that the block W, formed in step 8 of the algorithm, equals 
(/ — VV*)T{I — QQ*){AVMb — BVMa), because by definition of Wa and Wb, 

(/ - QQ*){WaMb - WbMa) = (/ - QQ*)i{AV - QRa)Mb - {BV - QRb)Ma) 

= {I- QQ*){AVMb - BVMa). 

Thus, W is indeed the projected preconditioned residual of problem (2.6). 

A common feature of block iterations is that some approximations can converge 
faster than others. If some columns vi,... ,Vj and qi,... ,qj in V and Q have con¬ 
verged, they can be “locked” using the “soft locking” strategy (see, e.g., [20]), the 
same way as done for Hermitian eigenproblems. With this approach, in subsequent 
iterations, one retains the converged vectors in V and Q, but removes the correspond¬ 
ing columns from the blocks W,Si,..., Sm, and P. Note that locking of the Schur 
vectors in Algorithm 2 should always be performed in order, i.e., Vj+i can be locked 
only after the preceding Schur vectors vi,... ,Vj have converged and been locked. 

The use of “locking” can have a strong impact on the convergence behavior of 
the method, especially when a large number of vectors have converged. As locking 
proceeds, an increasing number of columns of W,Si,..., Sm, and P is removed, 
leading to a shrinkage of the trial subspace. As a result, the trial subspaces can become 
excessively small, hindering the convergence rate and robustness. In contrast to the 
Hermitian eigenproblems, the effect of the subspace reduction can be significantly 
more pronounced in the non-Hermitian case, which generally requires searching in 
subspaces of a larger dimension. 

In order to overcome this problem, we propose to automatically adjust the pa¬ 
rameter m. In our implementation, we increase m to the largest integer such that 
m{k — q) < mok, where mo is the initial value of m and q is the number of converged 
eigenpairs. Specifically, we recalculate m as m ^ min{[mofc/(fc — g)J,20}. 

The GPLHR algorithm can be implemented with (m -|- 1) multiplications of the 
matrix A, and the same number of multiplications of B, by a block of k vectors per 
iteration. The allocated memory should be sufficient to store the basis of the trial 
and test subspaces Z and U, as well as the blocks AZ and HZ, whose size depends 
on the parameter m, i.e., s = (m -I- 2)k for (3.6), or s = (m -I- 3)fc for the augmented 
subspace (3.20). Thus, to decrease computational cost and storage, the value of m 
should be chosen as small as possible, but, on the other hand, should be large enough 
not to delay a rapid and stable convergence. For example, in most of our numerical 
experiments, we use m = 1. 

Note that step 28 of Algorithm 2 is optional. It provides means to switch between 
the LOBPCG-style search directions (3.21) and those defined by (3.22). We use the 
latter by default. 

Finally, we emphasize that Algorithm 2 is capable of computing larger numbers 
of Schur vectors incrementally, using deflation, similar to Hermitian eigenproblems. 
That is, if V and Q are the computed bases of k right and left generalized Schur 
vectors, the additional k generalized Schur vectors can be computed by applying 
GPLHR to the deflated pair {{I - QQ*)A{I -VV*),{I - QQ*)B{I -VV*)) AB^I, 
or to the matrix (/ — VV*)A{I — VV*) otherwise. This process can be repeated 
until the desired number of vectors is obtained. The ability to facilitate the deflation 
mechanism is an additional advantage of the generalized Schur formulations (2.3) 
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and (2.6), as opposed to eigendecomposition (2.2) for which the deflation procedure 
is not directly applicable. 

3.6. Alternative formulas for residual computation. As has been pointed 

out in Section 2.2, the “Q-free” generalized Schur form (2.6) is not unique. Therefore, 
the computation of the residual in (3.8) can, in principle, be based on a different 
formula rather than the expression with given 

by (2.11), (2.12) and (2.8). For example, if either (or both) of the upper triangular 
factors Ra,Rbi resulting from the harmonic projection, are nonsingular at all itera¬ 
tions, which is the most common case in the majority of practical computations, then 
the residual can be formulated as AV^'‘\R']^Rb) — or AV^^'> — BV^'^^R^^Ra)- 

All these formulas are asymptotically equivalent, in the sense that the correspond¬ 
ingly defined residuals vanish as converges to the desired right Schur vectors. One 
would naturally question if their choice could affect the performance of GPLHR before 
it reaches the asymptotic mode. Our extensive numerical experience indicates that 
the effect of the choice of the residual formulas is very mild. Other factors, e.g., the 
quality of the preconditioner, the subspace dimension parameter m, appropriate use 
of projectors Py± and Pq± for developing the trial subspace, and the choice of the 
additional search directions all have much stronger impact on the behavior of 
GPLHR. Therefore, we always construct the residual as in (3.8), which is well-defined 
for all regular pencil {A,B). The same considerations apply to the S'-vectors (3.9). 

3.7. The GPLHR algorithm for partial eigenvalue decomposition. If 

eigenvalues of {A,B) are non-deficient and the targeted eigenvectors X are known to 
be reasonably well-conditioned, then one would prefer to tackle the partial eigenvalue 
decomposition (2.2) directly, without resorting to the Schur form computations. Such 
situations, in particular, can arise when the non-Hermitian problem is obtained as a 
result of some perturbation of a Hermitian eigenproblem, e.g., as in the context of 
the EOM-CC [14, 15] or complex scaling methods [3, 16, 33] in quantum chemistry. 
Additionally, eigenvectors often carry certain physical information and it can be of 
interest to track evolution of their approximations without extra transformations. 

It is straightforward to apply the described framework to devise an eigenvector- 
based algorithm for computing the partial decomposition (2.2). In particular, using 
exactly the same orthogonal correction argument, we can obtain a version of GPLHR 
that operates on the approximate eigenvectors rather than Schur vectors. This scheme, 
which we refer to as GPLHR-EIG, is stated in Algorithm 3 of Appendix A. 

4. Preconditioning. The choice of the preconditioner is crucial for the overall 
performance of the GPLHR methods. While the discussion of particular precondi¬ 
tioning options is outside the scope of this paper, below, we briefly address several 
general points related to a proper setting of the preconditioning procedure. 

An attractive feature of the GPLHR algorithms is that the definition of an ap¬ 
propriate preconditioner T is very broad. We only require T to be a form of an 
approximate inverse of (ctbA — oaB). Hence, T can be constructed using any avail¬ 
able preconditioning technique that makes the solution of the shifted linear systems 
{(JbA — aAB)w = r easier to obtain. For a survey of various preconditioning strate¬ 
gies we refer the reader to [4, 35]. Note that, in practice, we expect T « (A — (jB)~^ 
if eigenvalues of interest are finite, and T « Rl if infinite eigenvalues are wanted. 
For example, in the latter case, one can let T k, [B + where r is a small 

regularization parameter. 


THE GPLHR METHOD FOR NON-HERMITIAN EIGENPROBLEMS 


17 


If T is not sufficient to ensure the eigensolver’s convergence with a small search 
subspace, it can be used as a preconditioner for an approximate iterative solve of 
{(TbA — aAB)w = r, e.g., using the GMRES algorithm [37] or IDR(s), a popular 
short-term recurrence Krylov subspace method for nonsymmetric systems [41]. To 
enhance the performance by fully utilizing BLAS-3 operations, block linear solvers 
should be used whenever available. Then the preconditioned linear solver itself can 
be used in place of the operator T to precondition the GPLHR algorithm. Thus, by 
setting the number of iterations of the linear solver or by adjusting its convergence 
threshold, one obtains a flexible framework for “refining” the preconditioning quality, 
which is especially helpful for computing interior eigenvalues. 

Finally, if T admits a fast computation, it can be periodically updated in the 
course of iterations, in such a way that the shift in the preconditioning operator 
is adjusted according to the current eigenvalue approximations. For example, this 
strategy was adopted in the GPLHR implementation [50], where T was diagonal. 

5. Numerical experiments. The goal of this section is two-fold. First, we 
would like to demonstrate numerically the significance of several algorithmic compo¬ 
nents of GPLHR, such as the use of additional search directions pA) or the application 
of a projector prior to preconditioning with T. Secondly, we are interested in compar¬ 
ing GPLHR with a number of existing state-of-the-art eigensolvers for non-Hermitian 
eigenvalue problems, which includes the block GD (BGD), the JDQR/JDQZ algo¬ 
rithm, and the implicitly restarted Arnold! method available in ARPAGK [23]. 


Problem 

Type 

n 

CT 

Preconditioner 

Spectr. region 

A15428 

stand. 

15428 

-40.871 

ILU(10-^) 

interior 

AF23560 

stand. 

23560 

-40 -b 300i 

ILU(10~^) 

largest mod. 

CRYIOOOO 

stand. 

10000 

8.0 

ILU(10-^) 

largest Re 

DW8192 

stand. 

8192 

1.0 

ILU(10-^) 

rightmost 

LSTAB NS 

gen. 

12619 

0 

GMRES(25)+LSC 

rightmost 

MHD4800 

gen. 

4800 

-0.1 + 0.5i 

(A-aB)-^ 

interior 

PDE2961 

stand. 

2961 

10.0 

ILU(10-^) 

largest Re 

QH882 

stand. 

882 

-150+ 180i 

GMRES(5)+ILU(10”'’) 

interior 

RDB3200L 

stand. 

3200 

2i 

ILU(10-^) 

rightmost 

UTM1700 

gen. 

1700 

0 

GMRES(15)+ILU(10-^) 

leftmost 


Table 5.1: Test problems. 


Table 5.1 summarizes the test problems considered throughout this section. For 
all problems, the shifts a are chosen to have physical meaning, i.e., they target the 
part of spectrum that is normally sought for in practical applications. This targeted 
region of the spectrum is stated in the rightmost column of the table. 

The problems in Table 5.1 come from a number of different applications. For ex¬ 
ample, the complex symmetric matrix A15428 arises from the resonant state calcula¬ 
tion in quantum chemistry [3, 16, 33]. Problem LSTAB_NS is delivered by the linear 
stability analysis of a model of two dimensional incompressible fluid flow over a back¬ 
ward facing step, constructed using the IFISS software package [12]. The domain is 
[—1, L] X [—1,1] with [—1, 0] X [—1, 0] cut out, where L = 10; the Reynolds number is 
300. A biquadratic/bilinear {Q2-Q1) finite element discretization with element width 
1/8 is used. More details on this example can be found in [49]. 
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The rest of the test matrices in Table 5.1 are obtained from Matrix Market^. 
The three larger problems AF23560, CRYIOOOO, and DW8192 arise from the stabil¬ 
ity analysis of the Navier-Stokes equation, crystal growth simulation, and integrated 
circuit design, respectively. MHD4800 is a generalized eigenproblem given by an ap¬ 
plication in magnetohydrodynamics, where interior eigenvalues forming the so-called 
Alfven branch are wanted. Another interior eigenvalue problem is QH882, coming 
from the power systems modeling. The matrices PDE2961 and RDB3200L are ob¬ 
tained from a 2D variable-coefficient linear elliptic partial differential equation and a 
reaction-diffusion Brusselator model in chemical engineering, respectively. UTM1700 
corresponds to a generalized eigenproblem from nuclear physics (plasmas). All of our 
tests are performed in Matlab. 

Throughout, unless otherwise is explicitly stated, the value of the trial subspace 
size parameter m is set to 1. As a preconditioner T we use triangular solves with in¬ 
complete L and U factors computed by the Matlab’s ilu routine with thresholding 
(denoted by ILU(t), where t is the threshold value). If a more efficient preconditioner 
is needed then T corresponds to several steps of the ILU-preconditioned GMRES, fur¬ 
ther referred to as GMRES(s)-l-ILU(t), where s is the number of GMRES steps. The 
only exception is LSTAB_NS, where the action of T is achieved by GMRES(25) pre¬ 
conditioned by an ideal version of the least squares commutator (ESC) preconditioner 
[13]. The default preconditioners for all test problems are also listed in Table 5.1. 


Problem 

•^thick 

-^LOBPCG 

W/o P'*) 

A15428 

12 

14 

16 

AF23560 

10 

DNC 

153 

CRYIOOOO 

27 

DNC 

DNC 

DW8192 

118 

55 

329 

LSTAB NS 

38 

DNC 

DNC 

MHD4800 

137 

DNC 

DNC 

PDE2961 

13 

DNC 

DNC 

QH882 

15 

DNC 

30 

RDB3200L 

10 

12 

12 

UTM1700 

16 

23 

21 


Table 5.2: Numbers of iterations performed by GPLHR to compute k = 5 eigenpairs with 
different choices of the search directions . 


5.1. Effects of the additional search directions. In Table 5.2, we report 
numbers of iterations performed by the GPLHR methods to compute five eigenpairs 
with different choices of the search directions In particular, we compare the 

default “thick-restarted” option (3.22) with the LOBPGG style choice (3.21), and the 
variant without which corresponds to the non-augmented subspace (3.6). 

As has already been mentioned, in practice, the quality of approximations pro¬ 
duced by GPLHR should be assessed through norms of the Schur residuals. However, 
in order to facilitate comparisons with other methods, at each step, we transform 
the current approximate Schur basis into eigenvector approximations and de¬ 
clare convergence of a given pair (A, x) using its relative eigenresidual norm. That is, 
we consider (A, a;) converged, and soft-lock it from further computations, if jjAa; — 

^http://math.nist.gov/MatrixMarket/ 
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Ai3a;||/||Aa;|| < 10“®. Note that, in this case, locking is performed in a contiguous 
fashion, i.e., eigenvector Xj+i can be locked only after the preceding vectors Xi,... ,Xj 
have converged and been locked. 

Table 5.2 clearly demonstrates that the choice = 14 +i. 2 fcj denoted by PtMck, 
generally leads to the fastest and most robust convergence behavior of GPLHR. Such 
a construction of the search directions requires a negligibly low cost, without extra 
matrix-vector products or preconditioning operations. However, their presence in the 
trial subspace indeed significantly accelerates and stabilizes the convergence, as can 
be observed by comparing the corresponding iteration counts with the case where the 
block is not appended to the subspace (3.22). Here, “DNC” denotes cases where 
the method failed to reduce the relative eigenresidual norm below the tolerance level 
of 10“® for all eigenpairs within 500 iterations. 

Furthermore, it is evident from Table 5.2 that the choice Pth/ck generally outper¬ 
forms the LOBPCG style formula (3.21), denoted by PloWcg- Specifically, the former 
always leads to convergence and yields lower iteration counts for all of the test prob¬ 
lems, except for DW8192. In our experience, however, this situation is not common, 
and the use of (3.22) typically results in a much more robust convergence behavior. 

5.2. Expansion of the trial subspace. The size of the GPLHR trial subspace 
is controlled by the parameter m. Generally, increasing m leads to a smaller iteration 
count. However, since the expansion of the GPLHR trial subspace requires extra 
matrix-vector multiplications and preconditioning operations, using larger values of 
m also increases the cost of each GPLHR step. Therefore, the parameter m should 
be chosen large enough to ensure stable convergence, but at the same time should be 
sufficiently small to maintain a relatively low cost of iterations. 


Number of iterations for different vaiues of m Number of preconditioned matvecs for different vaiues of m 



m 


m 


Fig. 5.1: Numbers of iterations (left) and preconditioned matrix-vector products (right) 
performed by GPLHR, preconditioned with ILU(O.OOl) and ILU(0.5), to compute fc = 10 
eigenpairs of AF23560 with different values of the subspace parameter m. 


In most cases, using larger values of m is counterproductive, as can be seen 
from Figure 5.1. The figure shows how the number of iterations and preconditioned 
matrix-vector products change as m grows. If a good preconditioner is at hand, such 
as ILU(O.OOl) for AF23560 in the figure, then it is common that the iteration count 
stabilizes at m = 1 or m = 2 and therefore further increase of the parameter is not 
needed as it leads to redundant computations. However, increasing m can be more 
effective when less efficient preconditioners are used, e.g., ILU(0.5) in our test. In 
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this case, a relatively low preconditioning quality is compensated by a larger trial 
subspace. In particular, in the AF23560 example with ILU(0.5), the optimal value of 
m in terms of computational work is 4. Here, again, the convergence tolerance, with 
respect to the relative eigenresidual norms, is set to 10“®. 

5.3. Effects of projectors. Our derivation of the GPLHR method in Section 3 
suggested that the preconditioner T should be preceded by application of the projector 
(I — (or (/ —ii B = /), and then followed by the projection {I — 

(see (3.7) and (3.14)). This motivated the occurrence of the corresponding 
projectors on both sides of T at steps 8 and 15 (if B ^ I) and at steps 10 and 17 (if 
B — I) oi Algorithm 2. 

While the presence of the left projector J—1/(0 ]/(*)* can be viewed as a part of the 
orthogonalization procedure on the trial subspace, and is thus natural, it is of interest 
to see whether the action of the right projection, performed before the application 
of T, has any effect on the GPLHR convergence. For this reason, we compare the 
GPLHR scheme in Algorithm 2 to its variant where the right projector is omitted. 
The corresponding iteration counts are reported in Table 5.3. 


Problem 

With proj. 

W/o proj. 

A15428 

14 

DNC 

AF23560 

9 

9 

CRYIOOOO 

13 

17 

DW8192 

44 

45 

LSTAB_NS 

49 

84 

MHD4800 

12 

12 

PDE2961 

10 

11 

QH882 

47 

43 

RDB3200L 

12 

21 

UTM1700 

25 

26 


Table 5.3: Numbers of iterations performed by GPLHR with and without the projection at 
the preconditioning step to compute A: = 10 eigenpairs. 


It can be observed from Table 5.3 that, in most of the tests, applying the projector 
before applying applying T gives a smaller number of iterations. In several cases, the 
difference in the iteration count is substantial; see the results for A15428, LS_STAB, 
and RDB3200L. The only example, where the absence of the projector yields slightly 
less iterations is QH882. However, this only happens for a small value of the parameter 
TO, and the effects of the projection become apparent and favorable as to is increased 
and the convergence is stabilized. 

5.4. Comparison with BGD. We now compare GPLHR to the classic BGD 
method used to solve non-Hermitian problems [27, 38]. This eigensolver is popular, 
in particular, in quantum chemistry; e.g., [50]. For a fair comparison, in BGD, we 
apply the harmonic Rayleigh-Ritz procedure [29] to extract approximate eigenvectors 
and use the same projected preconditioner (3.7) or (3.14), depending on whether the 
eigenproblem is generalized or standard, respectively. In fact, once equipped with 
the projected preconditioning, BGD can be viewed as a generalization of the JD 
methods [7, 40] to the block case. 

To ensure that both schemes have the same memory constraint, we consider a 
restarted variant of BGD, where the search subspace is collapsed after reaching the 
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dimension of (m + 3)/c, with the k current approximate eigenvectors (the harmonic 
Ritz vectors) used to start the next cycle. We denote this scheme as BGD((to + 3)fc). 
In particular, if to = 1, we get BGD(4fc). 

In principle, it is hard to compare preconditioned eigensolvers without knowledge 
of a specific preconditioner used within the given application. Therefore, our first 
experiment, reported in Figure 5.2, attempts to assess the methods under different 
preconditioning quality, where T ranges from a fairly crude approximation of {A — 
aB)~^ to a more accurate approximation. 


A15428, k = 10 


UTM1700, k = 5 




Fig. 5.2: Comparison of GPLHR (m = 1) and BGD(4fc) with different preconditioning 
quality for computing several eigenpairs of A15428 (left) and UTM1700 (right) closest to cr. 
The preconditioner T corresponds to an application of the ILU-preconditioned GMRES with 
stopping tolerance toLprec. 


In order to emulate a variety of preconditioners, we set T to an approximate 
solve of {A — aB)w = r using the (full) preconditioned GMRES. The quality of T can 
then be adjusted by varying the GMRES convergence tolerance, denoted by toLprec. 
Glearly, lower values toLprec correspond to a more effective preconditioner, whereas 
increasing toLprec leads to its deterioration. 

Figure 5.2 shows the numbers of (preconditioned) matrix-vector products for dif¬ 
ferent levels of preconditioning quality. Here, GPLHR and BGD(4A:) are applied for 
computing several eigenpairs of the problems A15428 (left) and UTM1700 (right). The 
reported preconditioned matrix-vector product (matvec) counts include only those 
performed by the eigensolvers, without the ones from the “inner” GMRES itera¬ 
tions, which are hidden in the application of T. As a GMRES preconditioner we use 
ILU(I0-2) for A15428 and ILU(10-4) for UTM1700. 

The results in Figure 5.2 are representative. They demonstrate that, under the 
same memory constraint, GPLHR tends to exhibit a better performance if the precon¬ 
ditioner is either reasonably strong (i.e., toLprec< 10“^ in both plots) or is relatively 
weak (i.e., toLprec around 10“^ on the left plot and around 10“^ on the right). At the 
same time, one can see that BGD(4fc) can outperform GPLHR for an “intermediate” 
preconditioning quality, i.e., corresponding to toLprec between 10“^ and 10“^ in the 
left, and between 10“^ and 0.009 on the right, plot of the figure. 

In Table 5.4, we compare the numbers of iterations (#it) and (preconditioned) 
matvecs (#pmv) performed by GPLHR and BGD(4fc) for all of our test problems 
with the default preconditioners listed in Table 5.1. Here, the matvec counts also 
include those accrued at “inner” GMRES iterations within the preconditioning step. 
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GPLHR 

BGD(4fc) 

Problem 

k 

#it 

#pmv 

#it 

#pmv 

A15428 

3 

17 

98 

DNG 

DNG 

5 

11 

100 

DNG 

DNG 

10 

14 

239 

41 

220 

AF23560 

3 

10 

52 

DNG 

DNG 

5 

10 

84 

48 

118 

10 

9 

156 

39 

232 

CRYIOOOO 

3 

29 

146 

142 

237 

5 

24 

192 

DNG 

DNG 

10 

14 

245 

DNG 

DNG 

DW8192 

3 

70 

377 

DNG 

DNG 

5 

51 

474 

363 

1447 

10 

48 

799 

450 

1808 

LSTAB_NS 

3 

51 

5616 

306 

8843 

5 

37 

6552 

DNG 

DNG 

10 

41 

15184 

DNG 

DNG 

MHD4800 

3 

30 

158 

DNG 

DNG 

5 

170 

1062 

DNG 

DNG 

10 

12 

206 

79 

432 

PDE2961 

3 

12 

70 

53 

123 

5 

11 

98 

46 

114 

10 

11 

178 

50 

207 

QH882 

3 

18 

588 

23 

348 

5 

14 

708 

22 

450 

10 

52 

4332 

92 

2190 

RDB3200L 

3 

8 

46 

16 

41 

5 

9 

80 

21 

76 

10 

11 

180 

33 

163 

UTM1700 

3 

17 

1312 

68 

1760 

5 

16 

2080 

44 

2208 

10 

25 

6048 

125 

6704 


Table 5.4: The numbers of iterations and preconditioned matrix-vector products performed 
by GPLHR (m = 1) and BGD(4fe). 


It can be seen from Table 5.4 that GPLHR is generally more robust than BGD(4fc). 
The former was always able to attain the requested level of the solution accuracy, 
whereas the latter sometimes failed to converge. Furthermore, one can see that in 
the examples where BGD(4A:) is successful, GPLHR typically performs fewer precon¬ 
ditioned matvecs and, hence, is more efficient; with the only exceptions being the 
QH882 matrix and, to a lesser extent, RDB3200L. 

In contrast to GPLHR, BGD can perform the approximate eigenvector extrac¬ 
tion from a trial subspace that expands throughout the iterations, while preserving 
the same number of preconditioned matrix-vector products per step. Therefore, the 
BGD convergence can be accelerated if additional storage is available. However, the 
memory usage increase, sufficient to reproduce the GPLHR performance, can often 
be significant in practice and thus prohibitive for problems of a very large size. 

This point is demonstrated in Figure 5.3, which shows that BGD can require 
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AF23560, k=10 CRY10000, k=5 




LSTAB_NS, k=7 




Fig. 5.3: Comparison of GPLHR (m = 1) and BGD(f) with different values of the restart 
parameter t. 


around two to three times more storage than GPLHR to achieve a comparable con¬ 
vergence rate. Thus, the GPLHR schemes can be preferred to BGD in cases where 
memory is tight. For all the test problems in this example we use the default precon¬ 
ditioners listed in Table 5.1. 

5.5. Comparison with JDQR/JDQZ. As has been mentioned in introduc¬ 
tion, the GPLHR method is substantially different from JDQR/JDQZ. The former is 
based on block iterations, whereas the latter are single vector schemes which compute 
one eigenpair after another in a sequential fashion. The goal of our experiments below 
is to point out the advantages of using the block GPLHR iterations for practical large- 
scale eigenvalue computations. In our comparisons, we use the jdqr.m and jdqz.m 
MATLAB implementations of the JDQR/JDQZ algorithms^. 

As a GPLHR preconditioner T we employ a fixed number of iterations of precon¬ 
ditioned GMRES (itc) for the system {A — aB)w = r, where r is a vector to which the 
preconditioner T is applied. Throughout, we use the standard single-vector GMRES 
to solve systems corresponding to different right-hand sides separately. However, in 
practice, it may be more efficient to use block linear solvers for a simultaneous solution 
of all systems. In the JDQR/JDQZ context, the preconditioner is applied through 
solving the correction equation. Therefore, in order to match the complexity and qual¬ 
ity of the GPLHR preconditioning to the correction equation solve, the solution of the 


^http://www.staff.science.uu.nl/-sleij101/ 
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latter is approximated by applying Hq steps of GMRES with the same preconditioner. 
The number Uq is set to the smallest value that yields the convergence of GPLHR. 

In our tests, we use the harmonic Rayleigh-Ritz option for eigenvector extraction 
in JDQR/JDQZ. To ensure the same memory constraint for both methods, we set 
the maximum size of the JDQR/JDQZ search subspace to 4fc (the parameter m in 
GPLHR is set to 1 in all of the runs). After restarting, the JDQR/JDQZ retains k + 5 
harmonic Ritz vectors, which is a default value in the jdqr.m and jdqz.m. 

In order to have a similar locking of the converged Schur vectors, we modify the 
GPLHR stopping criterion to be the same as in jdqr.m and jdqz.m. Namely, we con¬ 
sider the right Schur vector vj and an approximate eigenvalue Xj to be converged only 
if the preceding vectors V = [vi,V 2 , ■ ■ ■ ,Vj-i] have converged and the norm of the 
eigenresidual of the deflated problem {{I - QQ*)A{I-VV*), (I - QQ*)B{I-VV*)), 
evaluated at {Xj,Vj), is below the given tolerance level, set to 10“® in our tests. Here, 
the matrix Q = [gi, < 72 ,.. ., Qj-i] contains the converged left Schur vectors, and Q = V, 
a B = I. 



GPLHR 

JDQR 

Problem 

k 

ita 

#it 

#mv 

#prec 

#it 

#mv 

#prec 

A15428 

1 

15 

13 

417 

416 

20 

321 

341 


3 

10 

16 

949 

946 

45 

502 

541 


5 

10 

14 

1347 

1342 

57 

642 

685 

AF23560 

1 

0 

13 

27 

26 

DNC 

DNC 

DNC 


3 

0 

14 

75 

72 

50 

57 

101 


5 

0 

14 

117 

112 

69 

84 

139 


10 

0 

12 

214 

204 

112 

147 

225 

CRYIOOOO 

1 

3 

46 

369 

368 

62 

249 

311 


3 

3 

58 

1243 

1240 

182 

735 

911 


5 

4 

45 

1915 

1910 

218 

1105 

1309 


10 

3 

42 

3102 

3092 

DNC 

DNC 

DNC 

DW8192 

1 

0 

175 

351 

350 

659 

660 

1318 


3 

0 

81 

423 

420 

274 

281 

549 


5 

0 

59 

535 

530 

270 

258 

541 


10 

0 

49 

831 

821 

336 

371 

673 

PDE2961 

1 

7 

48 

769 

768 

DNC 

DNC 

DNC 


3 

0 

20 

117 

114 

48 

55 

97 


5 

0 

17 

149 

144 

57 

72 

115 


10 

0 

15 

242 

232 

86 

121 

173 

QH882 

1 

5 

24 

289 

288 

26 

157 

183 


3 

5 

24 

783 

780 

DNC 

DNC 

DNC 


5 

5 

17 

893 

888 

41 

261 

288 


10 

5 

50 

4294 

4284 

DNC 

DNC 

DNC 

RDB3200L 

1 

0 

31 

63 

62 

54 

55 

109 


3 

0 

20 

121 

118 

71 

78 

143 


5 

0 

22 

213 

208 

102 

117 

205 


10 

0 

20 

394 

384 

183 

218 

367 


Table 5.5: Comparison of GPLHR and JDQR algorithms for standard eigenproblems. 


In Table 5.5, we report results for standard eigenproblems, where GPLHR is 
compared with the JDQR algorithm. The table contains numbers of eigensolvers’ 
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iterations (#it), matrix-vector products (#mv), and applications of the preconditioner 
(#prec) used within GMRES solves. In all runs, we apply Uq GMRES steps with 
ILU(10“^) as the GPLHR preconditioner T and as the JD correction equation solve, 
except for the QH882 problem where a stronger ILU(10“®) preconditioner is used 
within GMRES. Note that the zero number Uq means that T is applied through a 
single shot of the ILU preconditioner, without the “inner” GMRES steps. 

One can see from Table 5.5 that GPLHR generally performs significantly less it¬ 
erations than JDQR. At the same time, each GPLHR iteration is more expensive, re¬ 
quiring (m-l-l)fc matrix-vector products and preconditioning operations, while JDQR 
only performs a single matrix-vector multiply plus the correction equation solve per 
step. As a result, the total number of matrix-vector products and preconditionings 
is, in most cases, larger for GPLHR, especially if larger numbers k of eigenpairs are 
wanted. However, generally, the increase is mild, typically around a factor of 2. On 
the other hand, GPLHR allows for a possibility to group matrix-vector products into 
a single matrix-block multiply in actual parallel implementation, which can lead to 
a 2 to 3 times speedup, e.g., [1], compared to separate multiplications involving sin¬ 
gle vectors, as in JDQR. Therefore, we expect that GPLHR schemes will ultimately 
outperform the JDQR/JDQZ family for large-scale eigenproblems on modern parallel 
computers. Optimal parallel implementations of GPLHR are, however, outside the 
scope of this paper and will be subject of further study. 

Note that, if ita = 0, JDQR often produces a larger #prec count, i.e., it requires 
more ILU applications compared to GPLHR. This is because of the way the correction 
equation is solved in JDQR/JDQZ [7, Section 2.6], which requires an extra application 
of the preconditioner to form the correction equation. Thus, one can expect that 
GPLHR gives a faster time to solution in cases where matrix-vector products are 
relatively cheap, but preconditioning is expensive. 



GPLHR 

JDQZ 

Problem 

k 

ita 

#it 

#mv 

#prec 

#it 

#mv 

#prec 

MHD4800 

1 

0 

20 

41 

40 

DNG 

DNG 

DNG 


3 

0 

15 

79 

76 

DNG 

DNG 

DNG 


5 

0 

12 

101 

96 

DNG 

DNG 

DNG 


10 

0 

9 

165 

155 

DNG 

DNG 

DNG 

LSTAB_NS 

1 

25 

5 

261 

260 

DNG 

DNG 

DNG 


3 

25 

49 

5307 

5304 

DNG 

DNG 

DNG 


5 

25 

28 

4893 

4888 

DNG 

DNG 

DNG 


10 

25 

36 

12984 

12974 

DNG 

DNG 

DNG 

UTM1700 

1 

15 

5 

161 

160 

MCV 

MCV 

MCV 


3 

15 

13 

963 

960 

16 

263 

280 


5 

15 

13 

1669 

1664 

31 

511 

545 


10 

15 

21 

5034 

5024 

166 

2691 

2865 


Table 5.6: Comparison of GPLHR and JDQZ algorithms for generalized eigenproblems. 


Table 5.6 contains the comparison results for generalized eigenproblems, where 
GPLHR is compared with the JDQZ algorithm. It reports the same quantities as in 
Table 5.5. Similarly, /^:prec corresponds to the number of preconditioner applications 
within the Hq GMRES iterations used as a GPLHR preconditioner T. In MHD4800, 
we set the GMRES preconditioner to {A — in LSTAB_NS to the ideal least 

squares commutator preconditioner, and to ILU(10“^) in UTM1700. 
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Table 5.6 shows that, for a given preconditioner and trial subspace size, JDQZ 
failed to converge for MHD4800 and LSTAB_NS. Slightly increasing Uq allowed us 
to restore the JDQZ convergence for MHD4800. However, even with a larger number 
of itc, the method misconverged, i.e., missed several wanted eigenpairs and instead 
returned the ones that are not among the k closest to a. For LSTAB_NS, we were 
not able to restore the convergence neither by increasing the number ito of GMRES 
iterations, nor by allowing a larger trial subspace. 


Ten eigenvalues of A15428 computed by GPLHR 


Ten eigenvalues of A15428 computed by JDQR 
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Fig. 5.4: Ten eigenvalues of A15428 closest to cr = —40.8710 (top) and twenty eigenvalues of 
MHD4800 (Alfven branch) closest to cr = —0.1 + 0.5i (bottom) computed by GPLHR (left) 
and JDQR/JDQZ (right) algorithms. 


We would like to emphasize the remarkable robustness of GPLHR. Given an 
appropriate preconditioner T, in all of our tests, the method consistently obtained the 
correct results, without convergence stagnation or misconvergence, as was observed in 
several JDQR/JDQZ runs (in Tables 5.5 and 5.6, “DNG” means that the method did 
not converge and “MGV” denotes misconvergence). The observed robustness is not 
surprising, as the block methods have traditionally been recommended for properly 
resolving eigenvalue clusters [30]. 

We have also noticed that JDQZ/JDQR are more likely to misconverge, i.e., dis¬ 
cover incorrect eigenvalues, when a larger number of eigenpairs (say, k > 10) is needed. 
Such situations are demonstrated in Figure 5.4, where fc = 10 eigenpairs of A15428 
(top) and fc = 20 eigenpairs of MHD4800 (bottom) closest to a are computed by dif¬ 
ferent eigensolvers. It can be seen that the GPLHR found all the targeted eigenvalues 
(within the dashed circle), whereas JDQR/JDQZ missed a few of them. This can 































THE GPLHR METHOD FOR NON-HERMITIAN EIGENPROBLEMS 


27 


be possibly caused by the repeated deflations performed by JDQR/JDQZ, which can 
accumulate error and lead to an inaccurate solution. As a GPLHR preconditioner T 
for A15428, we use an approximate solve of {A — aI)w = r by GMRES preconditioned 
with ILU(10“^) and convergence tolerance (toLprec) of 10“^. The GMRES solve with 
the same ILU(10“^) preconditioner is applied to the correction equation in JDQR. 
Eor the MHD4800 case, the GPLHR preconditioner is T = {A — which is also 

applied to the residuals in the JDQZ scheme to define corrections, without perform¬ 
ing GMRES steps. Note that adding inner GMRES iterations to solve the correction 
equation did not fix the JDQZ misconvergence issue. 

5.6. Comparison with ARPACK. In our last set of tests we compare GPLHR 
with the implicitly restarted Arnold! method available in ARPAGK through matlab’s 
eigs function. Here, we restrict attention to the top three examples in Table 5.1 
which are among the largest in our collection. The results for these problems are 
representative for large-scale eigenproblems. 

In order to compute k eigenvalues closest to the target shift a, ARPACK has to 
be supplied with a procedure to invert the shifted matrix A — aB, which represents a 
major challenge in this type of calculation. In essence, ARPACK applies the Arnoldi 
algorithm to the transformed matrix pair [{A—a'B)~^B, I), and therefore the inversion 
of A — aB should be performed to high accuracy to maintain exact Arnoldi relations. 

By contrast, GPLHR does not require any exact inverses and instead relies on 
preconditioning, given by T « (A — aB)~^, which, as we have observed, does not 
necessarily have to provide a very accurate approximation of the shift-and-invert op¬ 
erator. Our tests demonstrate that the savings obtained by GPLHR through avoiding 
the exact inverse of A — aB can be substantial. 

One way to invert A — aB is by performing its LU factorization. Then the 
matrix-vector products in ARPACK can be defined through triangular solves with 
the L and U factors. However, depending on the sparsity and structure, the full 
LU factorization can be costly, and is generally significantly more expensive than 
performing it incompletely in the context of ILU preconditioning. 


Problem 

k 

GPLHR 

ARPAGK 

A15428 

10 

335 

3006 

AF23560 

5 

2 

8 

CRYIOOOO 

5 

1.4 

1.9 


Table 5.7: Time required by GPLHR with the preconditioner T given by ILU(10“®) and 
ARPACK with the shift-and-invert operator computed through the LU decomposition of 
A — al to find k eigenvalues closest to a and their associated eigenvectors. 


Table 5.7 shows the difference in time required by GPLHR and ARPACK to com¬ 
pute a number of eigenvalues closest to a. Here, the shift-and-invert in ARPACK is 
performed by means of the LU factorization, whereas GPLHR only uses the incom¬ 
plete ILU factors as a preconditioner. One can see that, even for problems of the size 
of several tens of thousands, GPLHR can achieve almost 10 time speed up. Note that 
the results reported in Table 5.7 include time required to compute the full and in¬ 
complete triangular factors in ARPACK and GPLHR, respectively. All of our results 
were obtained on an iMac computer running Mac OS X 10.8.5, MATLAB R2013a, 
with a 3.4 GHz Intel Core i7 processor and 16GB 1600MHz DDR3 memory. 
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Problem 

k 

ita 

GPLHR 

ARPAGK 

A15428 

10 

7 

153 

196 

AF23560 

5 

5 

6 

131 

GRYIOOOO 

5 

5 

6 

95 


Table 5.8: Time required by GPLHR with the preconditioner T given by ita steps of pre¬ 
conditioned GMRES and ARPACK with shift-and-invert operator defined through a precon¬ 
ditioned GMRES solve of {A — al)w = r to full accuracy to compute k eigenvalues closest to 
a and their associated eigenvectors. In all cases, GMRES is preconditioned with ILU(10“^). 


Another way to perform shift-and-invert in ARPACK is through invoking an 
iterative linear solver to compute the solution of a system {A — aB)w = r to full 
accuracy. Instead, GPLHR can rely on a solve with a significantly more relaxed 
stopping criterion, which will play a role of the preconditioner T. The performance of 
ARPACK and GPLHR in such a setting is compared in Table 5.8, which reports time 
required by both solvers to obtain the solution. Here, the ARPACK’s shift-and-invert 
is performed using the full-accuracy GMRES solve, preconditioned by ILU(10“^). At 
the same time, the GPLHR preconditioner T is given by only a few {itc) GMRES 
steps with the same ILU(10“^) preconditioner. Once again, in these tests, we observe 
more than 10 time speedup when the eigensolution is performed using GPLHR. 

6. Conclusions. We have presented the Generalized Preconditioned Locally 
Harmonic Residual method (GPLHR) for computing eigenpairs of non-Hermitian reg¬ 
ular matrix pairs {A, B) that correspond to the eigenvalues closest to a given shift a. 
GPLHR is a block eigensolver, which takes advantage of a preconditioner when it is 
available, and does not require an exact or highly accurate shift-and-invert procedure. 
The method allows multiple matrix-vector product to be computed simultaneously, 
and can take advantage of BLASS through blocking. As a result, it is suitable for 
high performance computers with many processing units. 

Our numerical experiments demonstrated the robustness and reliability of the 
proposed algorithm. We compared GPLHR to several state-of-the-art eigensolvers 
(including block GD, JDQR/JDQZ, and implicitly restarted Arnold!) for a variety 
of eigenvalue problems coming from different applications. Our results show that 
GPLHR is competitive to the established approaches in general. It is often more 
efficient, especially if there is a limited amount of memory. 











THE GPLHR METHOD FOR NON-HERMITIAN EIGENPROBLEMS 


29 


Appendix A. The GPLHR-EIG algorithm. In Algorithm 3, we present an 
eigenvector-based version of the GPLHR algorithm, called GPLHR-EIG. 


Algorithm 3: The GPLHR-EIG algorithm for partial eigendecomposition (2.2) 

Input: A pair {A,B) of n-by-n matrices, shift ct £ C different from any 

eigenvalue of {A,B), preconditioner T, starting guess of eigenvectors 
^( 0 ) ^ and the subspace expansion parameter m. 

Output: If H 7 ^ 1, then approximate eigenvectors X £ 

sociated diagonal matrices Aa,Ab £ in (2.2), such that 

\j = AaH, j) / Ab^J, j) are the k eigenvalues of {A,B) closest to a. 

If B = 7, then approximate eigenvectors X £ and the associated 

diagonal matrix A£C*^^*^ofA: eigenvalues of A closest to a. 

1 : X ^ g ^ (A - aB)X- P ^ [ ]; 

2: Normalize columns of X {xj ^ ||); Aa ■(— diag{X*AX}; As diag{X*i3X}; 

3: while convergence not reached do 
4: P •«— orth(X); Q •«— orth(g); 

5: if 73 7 ^ / then 

6: W^{I-VV*)T{I-QQ*){AXAb-BXAa); 

7: else 

8: W ^ {I-VV*)T{I-VV*){AXAb - XAa)-, 

9: end if 

10: IT ^ orth(lP); So ^ VP; [ ]; 

11: for 1 = 1 —>■ m do 

12: if 73 7 ^ 7 then 

13: Si ^ (7 - VV*)TiI - gg*)(ASi_iAs - BSi-iAa)-, 

14: else 

15: Si ^ (7 - VV*)T{I - VV*){ASi-iAb - Si-iAa); 

16: end if 

17: Si ^ Si - lP(lT*Si); Si ^ Si - S(S*Si); Si ^ orth(Si); S ^ [S Si]; 

18: end for 

19: p^p- v{y*py, p^ p- w{w*p)-, p^p- s{s*py p ^ orth(P); 

20: Set the trial subspace Z [V, W, Si,..., Sm, P]', 

21: Q orth((A — ct73)[VP, Si, ..., Sm, P])’, Q ■(—Q — Q{Q*Q)', 

22: Set the test subspace U [g,Q]; 

23: Solve the projected eigenproblem {U*AZ,U*BZ); set Y to be the matrix of all 

eigenvectors ordered according to their eigenvalues’ closeness to a, ascendingly; 

24: T^y(:.l:k); 

25: P ^ WYw + SiYs, +... + SmYs^ + PYp, where Y = \Y^, Y^,Yi^,..., YJ^ , Y^f 

is a conforming partition of Y ; 

26: A ^ VYv + P;Q^(A- aB)X; 

27: P ^ ZY(: ,k+l:2k));‘^ 

28: Normalize columns of X ; Aa ^ diag{A*AA}; As diag{A*73A}; 

29: end while 

30: If 73 = 7, then A A. 4 . 


“This step can be disabled if the LOBPCG-style search direction of step 25 is preferred. 


The work flow of Algorithm 3 is nearly identical to that of the Schur vector 
based GPLHR in Algorithm 2. The main algorithmic difference is that GPLHR- 
EIG discards the harmonic Ritz values computed in step 23 and, instead, defines 
eigenvalue approximations using bilinear forms aj = x*Axj and j3j = x*Bxj, 








30 


EUGENE VECHARYNSKI, CHAO YANG AND FEI XUE 


where Xj are the columns of X at a given iteration (j = 1,..., /c). Clearly, with this 
choice, the ratios cxjll^j are exactly the Rayleigh quotients for (A, B) evaluated at the 
corresponding harmonic Ritz vectors. This approach is motivated by the fact that 
the Rayleigh quotients represent optimal eigenvalue approximations, and is common 
in the harmonic Rayleigh-Ritz based eigenvalue computations; e.g., [26, 29, 46]. 

The complexity and memory requirements of Algorithm 3 are comparable to those 
of Algorithm 2. Note that it is not necessary to keep both the approximate eigenvec¬ 
tors X and the orthonormal basis R, since X can be rewritten with V. Therefore, no 
additional memory for storing V is needed in practice. 

Although in the numerical experiments of Section 5 we report only the results for 
the Schur vector based variant of GPLHR (Algorithm 2), the performance of GPLHR- 
EIG was found to be similar for the problems under considerations. Therefore, we do 
not report results for Algorithm 3. Instead we refer the reader to [50], where a version 
of GPLHR-EIG has been benchmarked for a specific application. 

Finally, let us remark that GPLHR-EIG is not quite suitable for computing larger 
subsets of eigenpairs using the deflation technique. In contrast to the Schur vector 
based GPLHR, the computed set of eigenvectors cannot be directly used for defla¬ 
tion. Thus, Algorithm 3 should be invoked in situations where the number of desired 
eigenpairs k is sufficiently small to ensure their efficient computation in a single run 
of the algorithm with the block size (at least) k. 
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